We are now entering the phase of doing serious statistics: we run regression models, inspect them, plot, and tabulate them.

For this exercise, let’s pretend to be researchers interested in the determinants of the subjective risk of being treated for COVID-19 in a hospital. We suspect that age plays an important role, something we have already assessed by visualizing a similar relationship. So… we need the GESIS Panel COVID-19 survey data again:

library(dplyr)
library(haven)
library(sjlabelled)

gp_covid <-
  read_sav(
    "../data/ZA5667_v1-1-0.sav"
  ) %>% 
  set_na(na = c(-1:-99, 97)) %>% 
  mutate(
    likelihood_hospital = hzcy003a,
    age_cat = as.factor(age_cat)
  ) %>% 
  remove_all_labels()

We start our analysis by checking the relationship with an ANOVA. We also include some very essential covariates: gender, political orientation, and marital status.

1

Run an ANOVA on the proposed relationship between the subjective likelihood of treatment in a hospital and age. Also, include all the covariates.
We had a similar analysis before… you can simply add the control variables with +.

Alright, there’s definitely something in the data we should investigate. Oftentimes this is considered bad practice, but we could try to create a median split for the dependent variable and run a logistic regression.

2

Create a median split for the variable likelihood_hospital and run a logistic regression with all variables. What do you see in the results?
You can create new variables fairly easily with the function dplyr::mutate() in combination with the ifelse() function.

Running one model is not enough. It may be that the assumptions for logistic regression are not fulfilled, or a reviewer simply doesn’t like these types of regressions. Instead, she proposes to run a binomial regression but with a cauchit link.

3

Run the same model with the sole difference of using a cauchit link. How would you interpret the different regression coefficients?
Have a look at the help page ?family to see how you can include a cauchit link.

Using different link functions is sometimes done as they provide different model fits. That’s definitely something we should investigate as well.

4

Compare both regression models with an ANOVA. Use the option test = "LRT" to perform a likelihood ratio test. What’s your interpretation?
A p-value considered as statistically significant would indicate a difference between the models.